Scalable Iterative Graph Duplicate Detection

Similar resources

Scalable Duplicate Pruning Strategies for Parallel A* Graph Search

In parallel A* graph search on distributed-memory machines, different processors may perform significant duplicated work if inter-processor duplicates are not pruned. The only known method for duplicate pruning associates a particular processor with each distinct node of the search space using a suitable hash function. Then duplicate nodes arising in different processors are transmitted to the...

Structured Duplicate Detection in External-Memory Graph Search

We consider how to use external memory, such as disk storage, to improve the scalability of heuristic search in state-space graphs. To limit the number of slow disk I/O operations, we develop a new approach to duplicate detection in graph search that localizes memory references by partitioning the search graph based on an abstraction of the state space, and expanding the frontier nodes of the gr...

Space and Time Scalability of Duplicate Detection in Graph Data

Duplicate detection consists in determining different representations of real-world objects in a database. Recent research has considered the use of relationships among object representations to improve duplicate detection. In the general case where relationships form a graph, research has mainly focused on duplicate detection quality/effectiveness. Scalability has been neglected so far, even t...

Stochastic Attributed Relational Graph Matching for Image Near-Duplicate Detection

Attributed Relational Graph (ARG) is a useful model for representing many real-world relational patterns. Computing the similarity of ARGs is a fundamental problem for ARG based modelling. This report presents a novel stochastic framework for computing the similarity of ARGs, which defines the ARG similarity as the likelihood ratio of the stochastic process that transforms one ARG to the other....

Duplicate document detection

In document image filing applications it is important to be able to recognize whether a particular document has already been entered into the system either as an individual document or as an inclusion in another document. Document images could be matched on the basis of layout or contents. However, matching of layout may not be effective when style is strictly controlled. We develop a document ...

Journal

Journal title: IEEE Transactions on Knowledge and Data Engineering

Year: 2012

ISSN: 1041-4347

DOI: 10.1109/tkde.2011.99